Abstract

This paper describes the LDL++ system and the research advances that have enabled its
design and development. We begin by discussing the new nonmonotonic and nondeterministic
constructs that extend the functionality of the LDL++ language, while preserving
its model-theoretic and fixpoint semantics. Then, we describe the execution model and
the open architecture designed to support these new constructs and to facilitate the
integration with existing DBMSs and applications. Finally, we describe the lessons learned
by using LDL++ on various tested applications, such as middleware and datamining.



1 Introduction 



The LDL++ system, which was completed at UCLA in the summer of 2000, concludes
a research project that was started at MCC in 1989 in response to the lessons
learned from its predecessor, the LDL system. The LDL system, which was completed
in 1988, featured many technical advances in language design (Naqvi & Tsur, 1989)
and implementation techniques (Chimenti et al., 1990). However, its deployment
in actual applications (Tsur, 1990a; Tsur, 1990b) revealed many problems
and needed improvements, which motivated the design of the new LDL++ system.
Many of these problems were addressed in the early versions of the LDL++
prototype that were built at MCC in the period 1990-1993; but other problems,
particularly limitations due to the stratification requirement, called for advances
in nonmonotonic semantics, for which solutions were discovered and incorporated
into the system over time, up to the last version (Version 5.1), completed at UCLA
in the summer of 2000.

In this paper, we will concentrate on the most innovative and distinctive features
of LDL++, which can be summarized as follows:

• Its new language constructs designed to extend the expressive power of the 
language, by allowing negation and aggregates in recursion, while retaining 
the declarative semantics of Horn clauses, 

• Its execution model designed to support (i) the new language constructs, (ii) 
data-intensive applications via tight coupling with external databases, and 
(iii) an open architecture for extensibility to new application domains, 

• Its extensive application testbed designed to evaluate the effectiveness of
deductive database technology on data-intensive applications and new domains,
such as middleware and data mining.



2 The Language 

A challenging research objective pursued by LDL++ was that of extending the
expressive power of logic-based languages beyond that of LDL while retaining a
fully declarative model-theoretic and fixpoint semantics. Like many other deductive
database systems designed in the 80s (Minker, 1996), the old LDL system
required programs to be stratified with respect to nonmonotonic constructs such as
negation and set aggregates (Ramakrishnan & Ullman, 1995). While stratification
represented a major step forward in taming the difficult theoretical and practical
problems posed by nonmonotonicity in logic programs, it soon became clear that
it was too restrictive for many applications of practical importance. Stratification
makes it impossible to support efficiently even basic applications, such as Bill of
Materials and optimized graph traversals, whose procedural algorithms express simple
and useful generalizations of transitive closure computations. Thus, deductive
database researchers have striven to go beyond stratification and allow negation
and aggregates in the recursive definitions of new predicates. LDL++ provides a
comprehensive solution to this complex problem through the fully integrated notions of
(i) choice, (ii) User Defined Aggregates (UDAs), and (iii) XY-stratification.
XY-stratification generalizes stratification to support negation and (nonmonotonic)
aggregates in recursion. The choice construct (used to express functional
dependency constraints), by contrast, defines mappings that, albeit nondeterministic,
are monotonic and can thus be used freely in recursion. Moreover, this construct makes
it possible to provide a formal semantics for the notion of user-defined aggregates
(UDAs), and to identify a special class of UDAs that are monotonic (Zaniolo & Wang, 1999);
therefore, the LDL++ compiler recognizes monotonic UDAs and allows their unrestricted
usage in recursion. In summary, LDL++ provides a two-pronged solution to
the nonmonotonicity problem, by (i) enlarging the class of logic-based constructs
that are monotonic (with constructs such as choice and monotonic aggregates),
and (ii) supporting XY-stratification for hard-core nonmonotonic constructs, such
as negation and nonmonotonic aggregates.
These new constructs of LDL++ are fully integrated with all other constructs,
and are easy to learn and use. Indeed, a user need not know abstract semantic
concepts, such as stable models or well-founded models; instead, the user only
needs to follow simple syntactic rules, the same rules that are then checked by
the compiler. In fact, the semantic well-formedness of LDL++ programs can be
checked at compile time, a critical property of stratified programs that was lost in
later extensions, such as modular stratification (Ross, 1994). These new constructs
are described next.



2.1 Functional Constraints 

Say that we have a database containing the relations student(Name, Major, Year)
and professor(Name, Major). In fact, let us take a toy example that only has the
following facts (see footnote 1):



student('JimBlack', ee, senior).    professor(ohm, ee).
                                    professor(bell, ee).

Now, the rule is that the major of a student must match his/her advisor's major 
area of specialization. Then, eligible advisors can be computed as follows: 



elig_adv(S, P) <- student(S, Majr, Year), professor(P, Majr).

This yields:

elig_adv('JimBlack', ohm).
elig_adv('JimBlack', bell).

But, since a student can only have one advisor, the goal choice((S), (P)) must 
be added to our rule to force the selection of a unique advisor for each student: 

Example 2.1 

Computation of unique advisors by a choice rule 

actual_adv(S, P) <- student(S, Majr, Yr), professor(P, Majr),
                    choice((S), (P)).

The goal choice((S), (P)) can also be viewed as enforcing a functional dependency
(FD) S -> P on the results produced by the rule; thus, in actual_adv, the second
column (professor name) is functionally dependent on the first one (student name).
Therefore, we will refer to S and P, respectively, as the left side and the right side
of this FD, and of the choice goal defining it. The right side of a choice goal cannot
be empty, but its left side can be empty, denoting that all tuples produced must
share the same values for the right side attributes.

1 We follow the standard convention of using upper case initials to denote variables; lower case
initials and strings enclosed in quotes denote constants.



The result of the rule of Example 2.1 is nondeterministic: it can either return
a singleton relation containing the tuple ('JimBlack', ohm), or one containing the
tuple ('JimBlack', bell).

A program where the rules contain choice goals is called a choice program. The
semantics of a choice program P can be defined by transforming P into a program
with negation, foe(P), called the first order equivalent of P. Now, foe(P) exhibits
a multiplicity of stable models, each obeying the FDs defined by the choice goals;
each such stable model corresponds to an alternative set of answers for P and is
called a choice model for P. The first order equivalent of Example 2.1 is as follows:



Example 2.2 
The first order equivalent for Example 2.1 



actual_adv(S, P) <- student(S, Majr, Yr), professor(P, Majr),
                    chosen(S, P).
chosen(S, P) <- student(S, Majr, Yr), professor(P, Majr),
                ¬diffChoice(S, P).
diffChoice(S, P) <- chosen(S, P'), P ≠ P'.



This can be read as a statement that a professor will be assigned to a student 
whenever a different professor has not been assigned to the same student. In general, 
foe(P) is defined as follows: 

Definition 2.1 

Let P denote a program with choice rules: its first order equivalent foe(P) is 
obtained by the following transformation. Consider a choice rule r in P: 

    r: A <- B(Z), choice((X_1), (Y_1)), ..., choice((X_k), (Y_k)).

where,

(i) B(Z) denotes the conjunction of all the goals of r that are not choice goals,
and

(ii) X_i, Y_i, Z, 1 <= i <= k, denote vectors of variables occurring in the body of r
such that X_i ∩ Y_i = ∅ and X_i, Y_i ⊆ Z.

Then, foe(P) is constructed from P as follows:

1. Replace r with a rule r' obtained by substituting the choice goals with the
atom chosen_r(W):

    r': A <- B(Z), chosen_r(W).

where W ⊆ Z is the list of all variables appearing in choice goals, i.e.,
W = X_1 ∪ Y_1 ∪ ... ∪ X_k ∪ Y_k.

2. Add the new rule

    chosen_r(W) <- B(Z), ¬diffChoice_r(W).

3. For each choice atom choice((X_i), (Y_i)) (1 <= i <= k), add the new rule

    diffChoice_r(W) <- chosen_r(W'), Y_i ≠ Y_i'.

where (i) the list of variables W' is derived from W by replacing each A ∉ X_i
with a new variable A' (i.e., by priming those variables), and (ii) Y_i ≠ Y_i'
denotes the inequality of the vectors; i.e., Y_i ≠ Y_i' is true when for some
variable A ∈ Y_i and its primed counterpart A' ∈ Y_i', A ≠ A'.



Monotonic Nondeterminism 

Theorem 2.1 

Let P be a positive program with choice rules. Then the following properties
hold (Giannotti et al., 1991):



• foe(P) has one or more total stable models. 

• The chosen atoms in each stable model of foe(P) obey the FDs defined by 
the choice goals. 

Observe that the foe(P) of a program with choice does not have total well-founded
models; in fact, for our Example 2.1, the well-founded model yields undefined
values for advisors. Therefore, the choice construct can express nondeterministic
semantics, which can also be expressed by stable models, but not by well-founded
models. On the other hand, the choice model avoids the exponential complexity
that is normally encountered under stable model semantics. Indeed, the computation
of stable models is NP-hard (Schlipf, 1993), but the computation of choice
models for positive programs can be performed in polynomial time with respect
to the size of the database. This is basically due to the monotonic nature of the
choice construct, which yields a simple fixpoint computation for programs with choice
(Giannotti et al., 2001b). Indeed, the use of choice rules in positive programs
preserves their monotonic properties. A program P can be viewed as consisting of two
separate components: an extensional component (i.e., the database facts), denoted
edb(P), and an intensional one (i.e., the rules), denoted idb(P). Then, a positive
choice program defines a monotonic multi-valued mapping from edb(P) to idb(P),
as per the following theorem proven in (Giannotti et al., 2001b):



Theorem 2.2 

Let P and P' be two positive choice programs where idb(P') = idb(P) and edb(P') ⊇
edb(P). Then, if M is a choice model for P, there exists a choice model M'
for P' such that M' ⊇ M.

Two concrete semantics are possible for choice programs: one is an all-answers
semantics, and the other is the semantics under which any answer will do (don't-care
nondeterminism). While an all-answers semantics for choice is not without
interesting applications (Greco & Sacca, 1997), the single-answer semantics was
adopted by LDL++, because it is effective at supporting DB-PTIME problems
(Abiteboul et al., 1995). Theorem 2.2 then allows us to compute results
incrementally, as is done in differential fixpoint computations; in fact, to find an
answer, a program with choice can be implemented as an ordinary program, where
the choice predicates are memorized in a table; newly derived atoms that
violate the choice FDs are then simply discarded, much in the same way as duplicate
atoms are discarded during a fixpoint computation. Thus positive choice programs
represent a class of logic programs that are very well-behaved from both a semantic
and a computational viewpoint. The same can be said for choice programs with
stratified negation, which are defined next.
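
The table-based evaluation just described can be illustrated with a minimal Python
sketch for the rule of Example 2.1. This is only an illustration under stated
assumptions, not the LDL++ implementation: the function name actual_adv and the
tuple-based encoding of the relations are hypothetical, and the order in which
candidate tuples are visited stands in for the nondeterministic choice.

```python
def actual_adv(students, professors):
    """Return one advisor per student, honoring the FD S -> P.

    students:   iterable of (name, major, year) tuples
    professors: iterable of (name, major) tuples
    """
    chosen = {}   # the "chosen" table: student name -> professor name
    result = []
    for (s, s_major, _year) in students:
        for (p, p_major) in professors:
            if s_major != p_major:
                continue      # the body of the rule is not satisfied
            if s in chosen:
                continue      # would violate S -> P, so the atom is discarded
            chosen[s] = p
            result.append((s, p))
    return result


students = [("JimBlack", "ee", "senior")]
print(actual_adv(students, [("ohm", "ee"), ("bell", "ee")]))
print(actual_adv(students, [("bell", "ee"), ("ohm", "ee")]))
```

Each visiting order yields one of the two choice models of Example 2.1; a tuple that
conflicts with the chosen table is dropped exactly as a duplicate would be.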

Definition 2.2 

Let P be a choice program with negated goals. Then, P is said to be stratified when 
the program obtained from P by removing its choice goals is stratified. 

The stable models for a stratified choice program P can be computed using
an iterated choice fixpoint procedure that directly extends the iterated fixpoint
procedure for programs with stratified negation (Przymusinski, 1988; Zaniolo et al.,
1997); this is summarized next. Let P_i denote the rules of P (whose head is) in
stratum i, and let P*_i be the union of P_j, j <= i. Now, if M_i is a stable model
for P*_i, then every stable model for M_i ∪ P_{i+1} is a stable model for the program
P*_{i+1}. Therefore, the stable models of stratified choice programs can be computed
by modifying the iterated fixpoint procedure used for stratified programs so that
choice models (rather than the least models) are computed for strata containing
choice rules (Giannotti et al., 1998).



The Power of Choice 

The expressive power of choice was studied in (Giannotti et al., 2001b), where it
was shown that stratified Datalog with choice can express all computations that are
polynomial in the size of the database (i.e., DB-PTIME queries (Abiteboul et al.,
1995)). Without choice, DB-PTIME cannot be expressed in stratified Datalog, unless
a predefined total order is assumed for the universe, an assumption that would
violate the genericity principle (Abiteboul et al., 1995). In terms of computational
power, nondeterminism and order fulfill a similar function (Abiteboul et al., 1995):
in fact, the application of choice can also be viewed as nondeterministically and
incrementally generating a possible order on the universe, an order that is made
explicit by the predicate chain discussed in Example 2.4.

Before moving to Example 2.4, however, we would like to observe that the version
of choice supported in LDL++ is more powerful than other nondeterministic
constructs, such as the witness operator (Abiteboul et al., 1995) and an earlier
version of choice proposed in (Krishnamurthy & Naqvi, 1998) (called static choice
in (Giannotti et al., 2001b)). For instance, the following query cannot be expressed
in standard Datalog (since the query is nondeterministic), nor can it be expressed
by the early version of choice (Krishnamurthy & Naqvi, 1998) or by the witness
construct (Abiteboul et al., 1995). These early constructs express nondeterminism
in nonrecursive programs, but suffer from inadequate expressive power in recursive
programs (Giannotti et al., 2001b). In particular, they cannot express the query in
Example 2.3.

Example 2.3

Rooted spanning tree. We are given an undirected graph where an edge joining two 
nodes, say x and y, is represented by the pair g(x, y) and g(y, x). Then, a spanning 
tree in this graph, starting from the source node a, can be constructed by the 
following program: 

st(root, a).
st(X, Y) <- st(_, X), g(X, Y), Y ≠ a, Y ≠ X,
            choice((Y), (X)).

To illustrate the presence of multiple total choice models for this program, take a 
simple graph consisting of the following arcs: 

g(a,b). g(b,a). 
g(b,c). g(c,b). 
g(a,c). g(c,a). 

After the exit rule adds st(root, a), the recursive rule could add st(a, b) and
st(a, c), along with the two tuples chosen(a, b) and chosen(a, c) in the chosen
table. No further arc can be added after those, since the addition of st(b, c) or
st(c, b) would violate the FD that follows from choice((Y), (X)) enforced through
the chosen table. However, since st(root, a) was produced by the first rule (the
exit rule), rather than the second rule (the recursive choice rule), the table chosen
contains no tuple with second argument equal to the source node a. Therefore, to
avoid the addition of st(c, a) or st(b, a), the goal Y ≠ a was added to the recursive
rule.

By examining all possible solutions, we conclude that this program has three 
different choice models, for which we list only the st-atoms, below: 

1. st(a,b), st(b, c). 

2. st(a, b), st(a, c). 

3. st(a, c), st(c,b). 
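
The three models listed above can be checked mechanically. The following Python
sketch enumerates every choice model of the spanning tree program by exploring all
admissible sequences of choices; it is an exhaustive all-answers exploration written
for illustration only (the function name spanning_tree_models and the encoding of
g as a list of pairs are assumptions), whereas LDL++ computes a single answer.

```python
def spanning_tree_models(edges, source):
    """Enumerate the choice models as sets of st(X, Y) pairs (st(root, source) omitted)."""
    models = set()

    def extend(st, parent):
        # parent maps Y -> X and plays the role of the chosen table
        # for the goal choice((Y), (X)).
        reached = {source} | set(parent)          # nodes X with st(_, X)
        candidates = [(x, y) for (x, y) in edges
                      if x in reached and y != source and y != x
                      and y not in parent]        # FD Y -> X not yet fixed for y
        if not candidates:
            models.add(frozenset(st))             # fixpoint: nothing more can be added
            return
        for (x, y) in candidates:
            extend(st | {(x, y)}, {**parent, y: x})

    extend(frozenset(), {})
    return models


g = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c"), ("c", "a")]
for model in sorted(sorted(m) for m in spanning_tree_models(g, "a")):
    print(model)
```

Different orders of choice may reach the same model, which is why the results are
collected in a set.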

In addition to supporting nondeterministic queries, the introduction of choice
extends the power of Datalog for deterministic queries. This can be illustrated by
the following choice program, which places the elements of a relation d(Y) into a chain,
thus establishing a random total order on these elements, and then checks if the last
element in the chain is even.

Example 2.4 

The odd parity query by arranging the elements of a set in a chain. The elements 
of the set are stored by means of facts of the form d(Y) . 
chain(nil, nil).
chain(X, Y) <- chain(_, X), d(Y),
               choice((X), (Y)), choice((Y), (X)).
odd(X) <- chain(nil, X).
odd(Z) <- odd(X), chain(X, Y), chain(Y, Z).
isodd <- odd(X), ¬chain(X, Y).



Here chain(nil, nil) is the root of a chain linking all the elements of d(Y) — thus 
inducing a total order on elements of d. 

The negated goal in the last rule defines the last element in the chain. Observe
that the final isodd answer does not depend on the particular chain constructed; it
only depends on its length, which is equal to the cardinality of the set. Thus stratified
Datalog with choice can express deterministic queries, such as the parity query, that
cannot be expressed in stratified Datalog without choice (Abiteboul et al., 1995).

The parity query cannot be expressed in Datalog with stratified negation unless
we assume that the underlying universe is totally ordered, an assumption that
violates the data independence principle of genericity (Abiteboul et al., 1995). The
benefits of this added expressive power in real-life applications follow from the
fact that the chain program used in Example 2.4 to compute the odd parity
query can be used to stream through the elements of a set one by one, and compute
arbitrary aggregates on them. For instance, to count the cardinality of the set d(Y)
we can write:



mcount(X, 1) <- chain(nil, X).
mcount(Y, J1) <- mcount(X, J), chain(X, Y), J1 = J + 1.
count(J) <- mcount(X, J), ¬chain(X, Y).



The negated goal in the last rule qualifies the element(s) X without a successor in
the chain, i.e., X for which ¬chain(X, Y) holds for all Ys. Therefore, count is defined
by a program containing (and stratified with respect to) negation; thus, if count
is then used as a builtin aggregate, the stratification requirement must be enforced
upon every program that uses count.

However, if we seek to determine if the base relation d(Y) has more than 14
elements, then we can use the mcount aggregate instead of count, as follows:

morethan14 <- mcount(_, J), J > 14.



Now, mcount is what is commonly known as an online aggregate (Hellerstein
et al., 1997): i.e., an aggregate that produces early returns rather than final returns
as traditional aggregates do. The use of mcount over count offers clear performance
benefits; in fact, the computation of morethan14 can be terminated as soon as the
running count exceeds 14, whereas the application of count requires visiting all the
items in the chain. From a logical viewpoint, the benefits are even greater, since
count is no longer needed and the rule defining it can be eliminated, leaving us with
the program defining mcount, which is free of negation. Thus, no restriction is needed
when using mcount in recursive programs; and indeed, mcount (and morethan14) define
monotonic mappings in the lattice of set-containment.
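
The difference between early and final returns can be sketched in Python with a
generator. This is an illustrative analogy rather than LDL++ code: the names mcount
and more_than mirror the predicates above, and the lazily consumed iterable stands
in for the chain of elements.

```python
def mcount(items):
    """Online count: yield the running count after each element (early returns)."""
    count = 0
    for _ in items:
        count += 1
        yield count


def more_than(items, threshold):
    """True if items has more than `threshold` elements.

    any() stops consuming the generator at the first running count that
    exceeds the threshold, so the rest of the input is never visited.
    """
    return any(j > threshold for j in mcount(items))


print(more_than(range(1_000_000), 14))   # stops early
print(more_than(range(10), 14))          # must visit all ten items
```

A final-return count would instead have to exhaust the input before producing any
value, which is the behavior that forces stratification in the logical setting.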

In summary, the use of choice led us to (i) a simple and general definition of the
concept of aggregates, including user defined aggregates (UDAs), and (ii) the
identification of a special subclass of UDAs that are free from the yoke of stratification,
because they are monotonic. This topic is further discussed in the next section.



2.2 User Defined Aggregates 



The importance of aggregates in deductive databases has been recognized for a
long time (Ross & Sagiv, 1997; Van Gelder, 1993; Kemp et al., 1998). In particular,
there have been several attempts to overcome the limitations placed on the
use of aggregates in programs because of their nonmonotonic nature (Finkelstein
et al., 1996). Of particular interest is the work presented in (Ross & Sagiv, 1997),
where it is shown that rules with aggregates often define monotonic mappings in special
lattices, i.e., in lattices different from the standard set-containment lattice used
for T_P. Furthermore, programs with such monotonic aggregates can express many
interesting applications (Ross & Sagiv, 1997). Unfortunately, the lattice that makes
the aggregate rules of a given program monotonic is very difficult to identify
automatically (Van Gelder, 1993); this problem prevents the deployment of such a
notion of monotonicity in real deductive database systems.

A new wave of decision support applications has recently underscored the
importance of aggregates and the need for a wide range of new aggregates (Han &
Kamber, 2001). Examples include rollups and datacubes for OLAP applications,
running aggregates and window aggregates in time series analysis, and special
versions of standard aggregates used to construct classifiers or association rules in
datamining (Han & Kamber, 2001). Furthermore, a new form of aggregation, called
online aggregation, finds many uses in data-intensive applications (Hellerstein et al.,
1997). To better serve this wide new assortment of applications requiring specialized
aggregates, a deductive database system should support User Defined Aggregates
(UDAs). Therefore, the new LDL++ system supports powerful UDAs, including
online aggregates and monotonic aggregates, in a simple rule-based framework built
on formal logic-based semantics.

In LDL++ users can define a new aggregate by writing the single, multi,
and freturn rules (however, ereturn rules can be used to supplement or replace
freturn rules). The single rule defines the computation for the first element of
the set (for instance mcount has its second argument set to 1), while multi defines
the induction step whereby the value of the aggregate on a set of n + 1 elements is
derived from the aggregate value of the previous set with n elements and the value
of the (n + 1)-th element itself. A unique aggregate name is used as the first argument
in the head of these rules to eliminate any interference between the rules defining
different aggregates. For instance, for computing averages we must compute both
the count and the sum of the elements seen so far:

single(avg, Y, cs(1, Y)).
multi(avg, Y, cs(Cnt, Sum), cs(Cnt1, Sum1)) <-
        Cnt1 = Cnt + 1, Sum1 = Sum + Y.

Then, we write a freturn rule that, upon visiting the final element in d(Y),
produces the ratio of sum over count, as follows:

freturn(avg, Y, cs(Cnt, Sum), Val) <- Val = Sum/Cnt.

After an aggregate is defined by its single, multi, ereturn and/or freturn
rules, it can be invoked and used in other rules. For instance, our newly defined
avg can be invoked as follows:

p(avg<Y>) <- d(Y).

Thus LDL++ uses the special notation of pointed brackets, in the head of rules,
to denote the application of an aggregate. This syntax, which has been adopted by
other languages (Ramakrishnan et al., 1993), also supports an implicit 'group by'
construct, whereby the aggregate arguments in the head are implicitly grouped by
the other arguments in the head. Thus, to find the average salary of employees
grouped by department a user can write the following rule:

davg(DeptNo, avg<Sal>) <- employee(Eno, Sal, DeptNo).
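
The roles of the single, multi, and freturn rules, together with the implicit
'group by', can be mimicked in Python as follows. This is a sketch for illustration
only: the helper aggregate and the three avg_* functions are hypothetical names, and
the pair (count, sum) plays the role of the cs(Cnt, Sum) state.

```python
def aggregate(rows, single, multi, freturn):
    """Fold (group_key, value) rows with a user-defined aggregate, one state per group."""
    state = {}   # group key -> current aggregate state
    last = {}    # group key -> last value seen (passed to freturn)
    for key, y in rows:
        state[key] = single(y) if key not in state else multi(y, state[key])
        last[key] = y
    # freturn fires once per group, after its final element has been visited.
    return {key: freturn(last[key], s) for key, s in state.items()}


def avg_single(y):
    return (1, y)                       # cs(1, Y)


def avg_multi(y, cs):
    cnt, total = cs
    return (cnt + 1, total + y)         # cs(Cnt + 1, Sum + Y)


def avg_freturn(_y, cs):
    cnt, total = cs
    return total / cnt                  # Val = Sum / Cnt


# employee(Eno, Sal, DeptNo)
employee = [(1, 100, "d1"), (2, 200, "d1"), (3, 50, "d2")]
davg = aggregate(((dept, sal) for (_eno, sal, dept) in employee),
                 avg_single, avg_multi, avg_freturn)
print(davg)
```

Grouping by the non-aggregate head argument (DeptNo) corresponds to keeping a
separate state per key.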



The formal semantics of UDAs was introduced in (Zaniolo & Wang, 1999) and
is described in the Appendix: basically, the aggregate invocation rules and the
aggregate definition rules are rewritten into an equivalent program that calls on the
chain predicate defined as in Example 2.4. (Naturally, for the sake of efficiency, the
LDL++ system shortcuts the full rewriting used to define the formal semantics,
and implements the UDAs more directly.)



LDL++ UDAs have also been extended to support online aggregation (Hellerstein
et al., 1997). This is achieved by using ereturn rules in the definition of UDAs,
to either supplement or replace freturn rules.

For example, the computation of averages normally produces an approximate
value long before the whole data set is visited. We might then want to see the
average value obtained so far every 100 elements. To this end, the following rule
is added:

ereturn(avg, X, cs(Count, Sum), Avg) <-
        Count mod 100 = 0, Avg = Sum/Count.

Thus the ereturn rules produce early returns, while the freturn rules produce
final returns.

As a second example, let us consider the well-known problem of coalescing after
temporal projection in temporal databases (Zaniolo et al., 1997). For instance, in
Example 2.5 below, after projecting out the salary column from the employee relation,
we might have a situation where the same Eno appears in tuples whose
valid-time intervals overlap; these intervals must then be coalesced. Here, we use
closed intervals represented by the pair (From, To), where From is the start-time
and To is the end-time. Under the assumption that tuples are sorted by increasing
start-time, we can use a special coales aggregate to perform the task in one
pass through the data.

Example 2.5 

Coalescing overlapping intervals sorted by start time. 

empProj(Eno, coales<(From, To)>) <- emp(Eno, _, _, (From, To)).

single(coales, (Frm, To), (Frm, To)).
multi(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Nto)) <-
        Nfr <= Cto, Nto > Cto.
multi(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Cto)) <-
        Nfr <= Cto, Nto <= Cto.
multi(coales, (Nfr, Nto), (Cfr, Cto), (Nfr, Nto)) <- Cto < Nfr.

ereturn(coales, (Nfr, Nto), (Cfr, Cto), (Cfr, Cto)) <- Cto < Nfr.
freturn(coales, _, LastInt, LastInt).

Since the input intervals are ordered by their start time, the new interval (Nfr, Nto)
overlaps the current interval (Cfr, Cto) when Nfr <= Cto; in this situation, the two
intervals are merged into one that begins at Cfr and ends with the larger of Nto
and Cto. When the new interval does not overlap the current interval, the current
interval is returned by the ereturn rule, while the new interval becomes the current
one (see the last multi rule).
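
For a single Eno, the one-pass behavior of the coales rules can be sketched in Python
as below. The function name coalesce is an assumption made for this illustration;
the comments indicate which rule of Example 2.5 each branch corresponds to.

```python
def coalesce(intervals):
    """Merge overlapping closed intervals (frm, to), given sorted by start time."""
    out = []
    current = None
    for nfr, nto in intervals:
        if current is None:
            current = (nfr, nto)                   # single rule
        elif nfr <= current[1]:
            # first two multi rules: keep the start, take the larger end
            current = (current[0], max(current[1], nto))
        else:
            out.append(current)                    # ereturn: emit the finished interval
            current = (nfr, nto)                   # last multi rule: start a new one
    if current is not None:
        out.append(current)                        # freturn: emit the last interval
    return out


print(coalesce([(1, 3), (2, 5), (4, 4), (7, 9)]))
```

As in the rule-based version, correctness depends on the input being sorted by
increasing start time.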

Let P be a program. A rule r of P whose head contains aggregates is called
an aggregate rule. Then, P is said to be stratified w.r.t. aggregates when, for each
aggregate rule r in P, the stratum of r's head predicate is strictly higher than
the stratum of each predicate in the body of r. Therefore, the previous program is
stratified with respect to coales, which is nonmonotonic since it uses both early
returns and final returns.

While programs stratified with respect to aggregates can be used in many
applications, more advanced applications require the use of aggregates in more general
settings. Thus, LDL++ supports the usage of arbitrary aggregates in XY-stratified
programs, which will be discussed in Section 3. Furthermore, LDL++ supports
monotonic aggregates, which can be used freely in recursion.



Monotone Aggregation 

An important result that follows from the formalization of the semantics of UDAs
(Zaniolo & Wang, 1999) (see also Appendix) is that UDAs defined without final
return rules, i.e., with no freturn rule, define monotonic mappings, and can thus be
used without restrictions in the definition of recursive predicates. For instance, we
will next define a continuous count that returns the current count after each new
element (thus final returns are here omitted since they are redundant).

single(mcount, Y, 1).
multi(mcount, Y, Old, New) <- New = Old + 1.
ereturn(mcount, Y, Old, New) <- New = Old + 1.

Monotonic aggregates allow us to express the following two examples taken from
(Ross & Sagiv, 1997).



Join the Party Some people will come to the party no matter what, and their names
are stored in a sure(Person) relation. But others will join only after they know
that at least K = 3 of their friends will be there. Here, friend(P, F) denotes that
F is a friend of person P.

willcome(P) <- sure(P).
willcome(P) <- c_friends(P, K), K >= 3.
c_friends(P, mcount<F>) <- willcome(F), friend(P, F).

Consider now a computation of these rules on the following database. 

friend(jerry, mark).     sure(mark).
friend(penny, mark).     sure(tom).
friend(jerry, jane).     sure(jane).
friend(penny, jane).
friend(jerry, penny).
friend(penny, tom).

Then, the basic semi-naive computation yields: 

willcome(mark), willcome(tom), willcome(jane),
c_friends(jerry, 1), c_friends(penny, 1), c_friends(jerry, 2),
c_friends(penny, 2), c_friends(penny, 3), willcome(penny),
c_friends(jerry, 3), willcome(jerry).

This example illustrates how the standard semi-naive computation can be applied 
to queries containing monotone UDAs. Another interesting example is transitive 
ownership and control of corporations. 
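
For the Join the Party rules above, the fixpoint can be reproduced with a short
Python sketch. It uses a plain (naive) iteration to a fixpoint rather than the
semi-naive strategy, and the function name party and the set-of-pairs encoding of
friend are assumptions made for this illustration.

```python
def party(sure, friend, k=3):
    """Return the set of people who will come.

    sure:   set of people who come unconditionally
    friend: set of (p, f) pairs meaning f is a friend of p
    k:      minimum number of attending friends needed to join
    """
    willcome = set(sure)
    people = {p for (p, _f) in friend}
    changed = True
    while changed:
        changed = False
        for p in people - willcome:
            # running count of p's friends already known to come; it can only
            # grow as willcome grows, which is what makes the rules monotonic
            attending = sum(1 for (q, f) in friend if q == p and f in willcome)
            if attending >= k:
                willcome.add(p)
                changed = True
    return willcome


friend = {("jerry", "mark"), ("penny", "mark"), ("jerry", "jane"),
          ("penny", "jane"), ("jerry", "penny"), ("penny", "tom")}
print(sorted(party({"mark", "tom", "jane"}, friend)))
```

Because the count never decreases, a person admitted at one iteration is never
retracted later, so the loop terminates at the least fixpoint.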

Company Control Say that owns(C1, C2, Per) denotes the percentage of shares that
corporation C1 owns of corporation C2. Then, C1 controls C2 if it owns more than,
say, 50% of its shares. In general, to decide whether C1 controls C3 we must also add
the shares owned by corporations, such as C2, that are controlled by C1. This yields
the transitive control rules defined with the help of a continuous sum aggregate
that returns the partial sum for each new element:
control(C, C) <- owns(C, _, _).
control(Onr, C) <- towns(Onr, C, Per), Per > 50.
towns(Onr, C2, msum<Per>) <- control(Onr, C1), owns(C1, C2, Per).

single(msum, Y, Y).
multi(msum, Y, Old, New) <- New = Old + Y.
ereturn(msum, Y, Old, New) <- New = Old + Y.
Thus, every company controls itself, and a company C1 that has transitive
ownership of more than 50% of C2's shares controls C2. In the last rule, towns computes
transitive ownership with the help of msum, which adds up the shares of controlling
companies. Observe that any pair (Onr, C2) is added at most once to control; thus
the contribution of C1 to Onr's transitive ownership of C2 is accounted for only once.
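
The company control rules can be traced with the following Python sketch, which
iterates to a fixpoint in the same spirit. It is an illustration under assumptions:
the function name controls and the dictionary encoding of owns are hypothetical,
and share percentages are taken to be non-negative so that partial sums only grow.

```python
def controls(owns, threshold=50):
    """Compute the control relation as a set of (owner, company) pairs.

    owns: dict mapping (c1, c2) to the percentage of c2's shares owned by c1
    """
    owners = {c1 for (c1, _c2) in owns}
    companies = owners | {c2 for (_c1, c2) in owns}
    control = {(c, c) for c in owners}            # control(C, C) <- owns(C, _, _)
    changed = True
    while changed:
        changed = False
        for onr in owners:
            for c2 in companies:
                if (onr, c2) in control:
                    continue
                # transitive ownership: shares of c2 held by companies onr controls
                total = sum(per for (c1, tgt), per in owns.items()
                            if tgt == c2 and (onr, c1) in control)
                if total > threshold:
                    control.add((onr, c2))
                    changed = True
    return control


owns = {("a", "b"): 60, ("a", "c"): 30, ("b", "c"): 25}
print(sorted(controls(owns)))
```

Here a controls b directly (60%), and then controls c because its own 30% plus the
25% held by b exceeds the threshold.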



Bill-of-Materials (BoM) Applications BoM applications represent an important
application area that requires aggregates in recursive rules. For instance, let us say
that assembly(P1, P2, QT) denotes that P1 contains part P2 in quantity QT. We also
have elementary parts described by the relation basic_part(Part, Price). Then,
the following program computes the cost of a part as the sum of the cost of the
basic parts it contains:



part_cost(Part, 0, Cst) <- basic_part(Part, Cst).
part_cost(Part, mcount<Sb>, msum<MCst>) <-
        part_cost(Sb, ChC, Cst), prolfc(Sb, ChC),
        assembly(Part, Sb, Mult), MCst = Cst * Mult.

Thus, the key condition in the body of the second rule is that a subpart Sb is
counted in part_cost only when all Sb's children have been counted. This occurs
when the number of Sb's children counted so far by mcount is equal to the
out-degree of this node in the graph representing assembly. This number is kept in the
prolificacy table, prolfc(Part, ChC), which can be computed as follows:



prolf c(Pl, count(P2)) «- assembly(Pl, P2, _). 
prolf c(Pl, 0) <— basic_part(Pl,_). 
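A minimal Python sketch of this counting scheme follows. The function part_costs and its input encoding are assumptions made for illustration; the sketch also assumes that each (part, subpart) pair occurs once in assembly and that the assembly graph is acyclic.

```python
from collections import defaultdict

def part_costs(assembly, basic_part):
    """assembly: list of (part, subpart, qty); basic_part: dict part -> price."""
    prolfc = defaultdict(int)        # out-degree of each assembled part
    parents = defaultdict(list)      # subpart -> [(part, qty), ...]
    for part, sub, qty in assembly:
        prolfc[part] += 1
        parents[sub].append((part, qty))
    counted = defaultdict(int)       # plays the role of mcount
    running = defaultdict(int)       # plays the role of msum
    cost = dict(basic_part)          # parts whose cost is complete
    ready = list(basic_part)
    while ready:
        sub = ready.pop()
        for part, qty in parents[sub]:
            counted[part] += 1
            running[part] += cost[sub] * qty
            # A part is propagated only once all of its children are counted.
            if counted[part] == prolfc[part]:
                cost[part] = running[part]
                ready.append(part)
    return cost
```

For instance, a bike made of two wheels (each with ten spokes at 1 and one rim at 20) and one frame at 100 yields a wheel cost of 30 and a bike cost of 160.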

Therefore, the simple and general solution of the monotonic aggregation problem
introduced by LDL++ allows the concise expression of many interesting algorithms.
This concept can also be extended easily to SQL recursive queries, as shown in
(Wang & Zaniolo, 2000), where additional applications are also discussed.



2.3 Beyond Stratification 



The need to go beyond stratification has motivated much recent research. Several
deductive database systems have addressed it by supporting the notion of modular
stratification (Ross, 1994). Unfortunately, this approach suffers from poor usability,
since the existence of a modular stratification for a program can depend on
its extensional information (i.e., its fact base) and, in general, cannot be checked
without executing the program. The standard notion of stratification is instead
much easier to use, since it provides a simple criterion for the programmer to follow
and for the compiler to use when validating the program and optimizing its
execution. Therefore, LDL++ has introduced the notion of XY-stratified programs
that preserves the compilability and usability benefits of stratified programs while
achieving the expressive power of well-founded models (Kemp et al., 1995).
XY-stratified programs are locally stratified explicitly by a temporal argument: thus,
they can be viewed as Datalog1S programs, which are known to provide a powerful
tool for temporal reasoning (Baudinet et al., 1994; Zaniolo et al., 1997), or
as Statelog programs that were used to model active databases (Lausen et al., 1998b).
The deductive database system Aditi (Kemp et al., 1998) also supports the closely
related concept of explicitly locally stratified programs, which were shown to be as
powerful as well-founded models, since they can express their alternating fixpoint
computation (Kemp et al., 1995).



For instance, the ancestors of marc, with the number of generations that separate
them from marc, can be computed using the following program, which models the
differential fixpoint computation:

Example 2.6

Computing ancestors of Marc and their remoteness from Marc using the differential
fixpoint approach.

r1 : delta_anc(0, marc).
r2 : delta_anc(J + 1, Y) <- delta_anc(J, X), parent(Y, X), ¬all_anc(J, Y).
r3 : all_anc(J + 1, X) <- all_anc(J, X).
r4 : all_anc(J, X) <- delta_anc(J, X).

This program is locally stratified by the first arguments in delta_anc and all_anc
that serve as temporal arguments (thus +1 is a postfix successor function symbol,
much the same as s(J) that denotes the successor of J in Datalog1S (Zaniolo
et al., 1997)). The zero stratum consists of atoms of nonrecursive predicates such
as parent and of atoms that unify with all_anc(0, X) or delta_anc(0, X). The k-th
stratum consists of atoms of the form all_anc(k, X), delta_anc(k, X). Thus, the
previous program is locally stratified (Przymusinski, 1988), since the heads of recursive
rules belong to strata that are one above those of their goals. Alternatively, we can
view the previous program as a compact representation for the stratified program
obtained by instantiating the temporal argument to integers and attaching them
to the predicate names, thus generating an infinite sequence of unique names.

Also observe that the temporal arguments in the goals of each rule are either the
same as, or one less than, the temporal argument in the head. Then, there are two
kinds of rules in our example: (i) X-rules (i.e., horizontal rules), where the temporal
argument in each of their goals is the same as that in their heads, and (ii) Y-rules
(i.e., vertical rules), where the temporal arguments in some of their goals are one less
than those in their heads. Formally, let P be a set of rules defining mutually recursive
predicates, where each recursive predicate has a distinguished temporal argument and
every rule in P is either an X-rule or a Y-rule. Then, P will be said to be an XY-program.

For instance, the program in Example 2.6 is an XY-program, where r1 and r4 are
X-rules, while r2 and r3 are Y-rules.

A simple test can now be used to decide whether an XY-program P is locally 
stratified. The test begins by labelling all the head predicates in P with the prefix 
'new'. Then, the body predicates with the same temporal argument as the head 
are also labelled with the prefix 'new', while the others are labelled with the prefix 
'old'. Finally, the temporal arguments are dropped from the program. The resulting
program is called the bistate version of P and is denoted P_bis.

Example 2.7

The bistate version of the program in Example 2.6

new_delta_anc(marc).
new_delta_anc(Y) <- old_delta_anc(X), parent(Y, X), ¬old_all_anc(Y).
new_all_anc(X) <- new_delta_anc(X).
new_all_anc(X) <- old_all_anc(X).



Now we have that (Zaniolo et al., 199S):

Definition 2.3

Let P be an XY-program. P is said to be XY-stratified when P_bis is a stratified
program.

Theorem 2.3

Let P be an XY-stratified program. Then P is locally stratified.

The program of Example 2.7 is stratified with the following strata: S0 = {parent,
old_all_anc, old_delta_anc}, S1 = {new_delta_anc}, and S2 = {new_all_anc}.
Thus, the program in Example 2.6 is locally stratified.
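The XY-stratification test lends itself to a short sketch. The Python below is an illustration only: the rule encoding (a head predicate, its temporal offset, and a list of body goals with their offsets and a negation flag) and the names bistate and stratify are assumptions, not part of the system. The stratum assignment it returns is the lowest valid one, so it need not coincide with the listing given above.

```python
def bistate(rules):
    """rules: list of (head, head_t, body); body is a list of (pred, t, negated).
    t is the temporal offset (0 for J, 1 for J + 1), or None for predicates
    without a temporal argument."""
    program = []
    for head, head_t, body in rules:
        new_body = []
        for pred, t, negated in body:
            if t is None:
                label = pred                 # nonrecursive predicate: unchanged
            elif t == head_t:
                label = "new_" + pred        # same temporal argument as the head
            else:
                label = "old_" + pred        # earlier temporal argument
            new_body.append((label, negated))
        program.append(("new_" + head, new_body))
    return program

def stratify(program):
    """Return a stratum for every predicate, or None if not stratified."""
    preds = {h for h, _ in program} | {p for _, body in program for p, _ in body}
    stratum = dict.fromkeys(preds, 0)
    changed = True
    while changed:
        changed = False
        for head, body in program:
            for pred, negated in body:
                need = stratum[pred] + (1 if negated else 0)
                if stratum[head] < need:
                    if need > len(preds):    # only a cycle through negation gets here
                        return None
                    stratum[head] = need
                    changed = True
    return stratum
```

Applied to the four rules of Example 2.6, bistate produces the rules of Example 2.7 (without their arguments) and stratify succeeds; a pair of rules such as p(J) <- ¬q(J) and q(J) <- ¬p(J) is rejected.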



For an XY-stratified program P, the general iterated fixpoint procedure
(Przymusinski, 1988) used to compute the stable model of locally stratified programs
(Zaniolo et al., 1993) becomes quite simple; basically, it reduces to a repeated
computation over the stratified program P_bis. For instance, for Example 2.7 we
compute new_delta_anc from old_delta_anc and then new_all_anc from this. Then,
the 'old' relations are re-initialized with the content of the 'new' ones so derived,
and the process is repeated. Furthermore, since the temporal arguments have been
removed from this program, we need to

1. store the temporal argument as an external fact counter(T),

2. add a new goal counter(I_r) to each exit rule r in P_bis, where I_r is the variable
from the temporal arguments of the original rule r, and

3. for each recursive predicate q, add the rule:

q(J, X) <- new_q(X), counter(J).

The program so constructed will be called the synchronized bistate version of P,
denoted syncbi(P). For instance, to obtain the synchronized version of the program
in Example 2.7 we need to change the first rule to

new_delta_anc(marc) <- counter(0).

since the temporal argument in the original exit rule was the constant 0. Then, we
must add the following rules:

delta_anc(J, X) <- new_delta_anc(X), counter(J).
all_anc(J, X) <- new_all_anc(X), counter(J).

Then, the iterated fixpoint computation for an XY-stratified program can be 
implemented by the following procedure: 

Procedure 2.1 

Computing a stable model of an XY-stratified program P: Add the fact counter(0).
Then, forever repeat the following two steps:

1. Compute the stable model of syncbi(P). 

2. For each recursive predicate q, replace old_q with new_q, computed in the 
previous step. Then, increase the value of counter by one. 

Since syncbi(P) is stratified, we can then use the iterated fixpoint computation 
to compute its stable model. 
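As an illustration, Procedure 2.1 can be traced on the ancestor program with the following Python sketch. The function name ancestors and the set-based encoding of parent are assumptions, and the loop stops when no new delta tuples are produced, a termination test that is added here and is not part of the procedure as stated.

```python
def ancestors(parent, start):
    """parent is a set of (p, c) pairs meaning that p is a parent of c."""
    counter = 0                        # the external fact counter(T)
    old_delta, old_all = set(), set()
    remoteness = {}                    # delta_anc(J, X) stored as X -> J
    while True:
        # Step 1: evaluate the stratified bistate program, stratum by stratum.
        if counter == 0:
            new_delta = {start}        # exit rule, guarded by counter(0)
        else:
            new_delta = {y for (y, x) in parent
                         if x in old_delta and y not in old_all}
        new_all = old_all | new_delta  # both new_all_anc rules
        for x in new_delta:
            remoteness[x] = counter    # delta_anc(J, X) <- new_delta_anc(X), counter(J)
        if not new_delta:
            break                      # assumed termination test
        # Step 2: the 'new' relations become the 'old' ones; advance the counter.
        old_delta, old_all = new_delta, new_all
        counter += 1
    return remoteness
```

With parent facts (ann, marc), (bob, marc), (cat, ann) and (dan, cat), the sketch assigns remoteness 0 to marc, 1 to ann and bob, 2 to cat, and 3 to dan.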



Since each XY-stratified program is locally stratified (Przymusinski, 1988), it is
guaranteed to have a unique stable model, which is also known as its perfect model
(Przymusinski, 1988). But the special syntactic structure of XY-stratified programs
allows an efficient computation of their perfect models using Procedure 2.1; moreover,
in the actual LDL++ implementation, this computation is further improved
with the optimization techniques discussed next. For instance, the replacement of
old_q with new_q described in the last step of Procedure 2.1 becomes an operation
of (small) constant cost when it is implemented by switching the pointers to the
relations. A second improvement concerns copy rules. For instance, r3 in Example 2.6
is a copy rule that copies the new values of all_anc from its old values. Observe that
the body and the head of this rule are identical, except for the prefixes new or old,
in its bistate version (Example 2.7). Thus, in order to compute new_all_anc, we first
execute the copy rule by simply setting the pointer to new_all_anc to point to
old_all_anc, which is a constant-time operation. Rule r4, which adds tuples to
new_all_anc, is then executed after r3.
In writing XY-stratified programs, the user must also be concerned with
termination conditions, since, e.g., a rule such as r3 in Example 2.6 could, if left
unchecked, keep producing all_anc results under a new temporal argument after
delta_anc becomes empty. One solution to this problem is for the user to add the
goal delta_anc(J, _) to rule r3. Then, the computation of all_anc stops as soon as no
new delta_anc(J, _) is generated. Alternatively, our program could be called from
a goal such as delta_anc(J, Y). In this case, if r2 fails to produce any result for a
value J, no more results can be produced at successive steps, since delta_anc(J, Y)
is a positive goal of r2. The LDL++ system is capable of recognizing these
situations, and it will terminate the computation of Procedure 2.1 when either condition
occurs.

Example 2.8 solves the coalescing problem without relying on tuples being sorted
on their start-time (an assumption made in Example 2.5). To this end, it uses an
XY-stratified program that iterates over two basic computation steps. The first
step is defined by the overlap rule, which identifies pairs of distinct intervals that
overlap, where the first interval contains the start of the second interval. The second
step consists of deriving a new interval that begins at the start of the first interval
and ends at the later of the two endpoints. Finally, a rule final_e_hist returns the
intervals that do not overlap other intervals (after eliminating the temporal argument).



Example 2.8

Coalescing overlapping periods into maximal periods after a projection

e_hist(0, Eno, Frm, To) <- emp_dep_sal(0, Eno, _, _, Frm, To).

overlap(J + 1, Eno, Frm1, To1, Frm2, To2) <-
    e_hist(J, Eno, Frm1, To1),
    e_hist(J, Eno, Frm2, To2),
    Frm1 ≤ Frm2, Frm2 ≤ To1,
    distinct(Frm1, To1, Frm2, To2).

e_hist(J, Eno, Frm1, To) <- overlap(J, Eno, Frm1, To1, Frm2, To2),
    select_larger(To1, To2, To).

final_e_hist(J + 1, Eno, Frm, To) <- e_hist(J, Eno, Frm, To),
    ¬overlap(J + 1, Eno, Frm, To, _, _).

distinct(Frm1, To1, Frm2, To2) <- To1 ≠ To2.
distinct(Frm1, To1, Frm2, To2) <- Frm1 ≠ Frm2.
select_larger(X, Y, X) <- X ≥ Y.
select_larger(X, Y, Y) <- Y > X.
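The two-step iteration can be pictured with the Python sketch below, written for the intervals of a single employee. It follows the prose description rather than the rules literally: the function name coalesce is hypothetical, and an interval is treated as final only when it takes part in no overlap in either position, which is an assumption of this sketch.

```python
def coalesce(intervals):
    """intervals is a set of (frm, to) pairs for one employee, with frm <= to."""
    current = set(intervals)           # e_hist at the current step
    final = set()
    while current:
        # First step: pairs of distinct intervals where the first interval
        # contains the start of the second one.
        overlaps = {(a, b) for a in current for b in current
                    if a != b and a[0] <= b[0] <= a[1]}
        involved = {iv for pair in overlaps for iv in pair}
        final |= current - involved    # intervals that overlap no other interval
        # Second step: merged interval from the first start to the later end.
        current = {(a[0], max(a[1], b[1])) for a, b in overlaps}
    return final
```

For the intervals (1, 5), (3, 7) and (6, 9) the sketch needs two merge steps and returns the single maximal period (1, 9).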

As demonstrated by these examples, XY-stratified programs allow an efficient
logic-based expression of procedural algorithms. For instance, the alternating
fixpoint procedure used in the computation of well-founded models can also be
expressed using these programs (Kemp et al., 1995). In general, XY-stratified
programs are quite powerful, as demonstrated by the fact that these programs (without
choice, aggregates, and function symbols) are known to be equivalent to Statelog
programs (Lausen et al., 1998a), which have PSPACE complexity and can express the
while queries (Abiteboul et al., 1995). Finally, observe that the bistate programs
for the examples used here are nonrecursive. In general, by making the computation
of the recursive predicate explicit, as was done for the anc example, it is possible
to rewrite an XY-stratified program P whose bistate version P_bis is recursive into
an XY-stratified program P' whose bistate version P'_bis is nonrecursive.



Choice and Aggregates in XY-stratified Programs 



As described in Section 2.1, choice can be used in stratified programs with no
restriction, and its stable model can be computed by an iterated choice fixpoint
procedure. Generalizing this notion, the LDL++ system supports the use of choice in
programs that are XY-stratified with respect to negation. The following conditions
are, however, enforced to assure the existence of stable models for a given program
P (Giannotti et al., 1998):

• The program obtained from P by removing its choice goals is XY-stratified
w.r.t. negation, and

• If r is a recursive choice rule in P, then some choice goal of r contains r's
temporal variable in its left side.

After checking these conditions, the LDL++ compiler constructs syncbi(P) by
dropping the temporal variable from the choice goals and transforming the rest
of the rules as described in the previous section. Then, the program syncbi(P)
so obtained is a stratified choice program and its stable models can be computed
accordingly; therefore, each stable model for the original XY-stratified program P
is computed by simply applying Procedure 2.1 with no modification (Zaniolo et al.,
1997; Giannotti et al., 1998).



Using the simple syntactic characterization given in Section 2.2, LDL++ draws
a sharp distinction between monotonic and nonmonotonic aggregates. No restriction
is imposed on programs with only monotonic aggregates and no negation.
But recursive programs with nonmonotonic aggregates must satisfy the following
conditions (which assure that once the aggregates are expanded as described in
Section 2.2, the resulting choice program satisfies the XY-stratification conditions
for choice programs discussed in the previous paragraph):

• For each recursive rule, the temporal variable must be contained in the
group-by attributes, and

• The bistate version of P must be stratified w.r.t. negation and nonmonotonic
aggregates.

After checking these simple conditions, the LDL++ compiler proceeds with the
usual computation of syncbi(P) as previously described.

For instance, the following XY-stratified program with aggregates expresses Floyd's
algorithm to compute the least-cost path between pairs of nodes in a graph. Here,
g(X, Y, C) denotes an arc from X to Y of cost C:

Example 2.9

Floyd's least-cost paths between all node pairs.

delta(0, X, Y, C) <- g(X, Y, C).
new(J + 1, X, Z, C) <- delta(J, X, Y, C1), all(J, Y, Z, C2), C = C1 + C2.
new(J + 1, X, Z, C) <- all(J, X, Y, C1), delta(J, Y, Z, C2), C = C1 + C2.
newmin(J, X, Z, min(C)) <- new(J, X, Z, C).
discard(J, X, Z, C1) <- newmin(J, X, Z, C1), all(J, X, Z, C2), C1 > C2.
delta(J, X, Z, C) <- newmin(J, X, Z, C), ¬discard(J, X, Z, _).
all(J + 1, X, Z, C) <- all(J, X, Z, C), ¬delta(J + 1, X, Z, _).
all(J, X, Z, C) <- delta(J, X, Z, C).



The fourth rule in this example uses a nonmonotonic min aggregate to select the 
least cost pairs among those just generated (observe that the temporal variable J 
appears among the group-by attributes). The next two rules derive the new delta 
pairs by discarding from new those that are larger than any existing pair in all. 
This new delta is then used to update all and compute new pairs. 
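The iteration described by these rules can be sketched in Python as follows. The function least_cost_paths and the dictionary encoding of g are hypothetical. As a deliberate deviation that guarantees termination for non-negative costs, the sketch keeps a new pair in delta only when it strictly improves on the cost recorded in all.

```python
def least_cost_paths(arcs):
    """arcs maps (x, y) to the cost of the arc from x to y."""
    INF = float("inf")
    all_paths = {}
    delta = dict(arcs)                       # delta at step 0
    while delta:
        all_paths.update(delta)              # all(J): old pairs unless replaced by delta
        new = {}                             # newmin: least cost per (x, z)
        for (x, y), c1 in delta.items():     # delta joined with all
            for (y2, z), c2 in all_paths.items():
                if y == y2 and c1 + c2 < new.get((x, z), INF):
                    new[(x, z)] = c1 + c2
        for (x, y), c1 in all_paths.items():  # all joined with delta
            for (y2, z), c2 in delta.items():
                if y == y2 and c1 + c2 < new.get((x, z), INF):
                    new[(x, z)] = c1 + c2
        # Discard pairs that do not improve on all (strict test: an assumption).
        delta = {k: c for k, c in new.items() if c < all_paths.get(k, INF)}
    return all_paths
```

For arcs a to b of cost 1, b to c of cost 2, and a to c of cost 5, the direct cost of 5 for (a, c) is replaced by 3 in the second iteration.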

By supporting UDAs, choice, and XY-stratification, LDL++ provides a powerful,
fully integrated framework for expressing logic-based computation and modelling.
In addition to expressing complex computations (Zaniolo et al., 1998), this power has
been used to model the AI planning problem (Brogi et al., 1997), database updates,
and active database rules (Zaniolo, 1997). For instance, to model AI planning,
preconditions can simply be expressed by rules, choice can be used to select among
applicable actions, and frame axioms can be expressed by XY-stratified rules that
describe changes from the old state to the new state (Brogi et al., 1997).



3 The System 

The main objectives in the design of the LDL++ system were (i) strengthening
the architecture of the previous LDL system (Chimenti et al., 1990), (ii) improving
the system's usability and the application development turnaround time, and (iii)
providing efficient support for the new language constructs.

While the first objective could be achieved by building on and extending the
general architecture of the predecessor LDL system, the second objective forced us to
depart significantly from the compilation and execution approach used by the LDL
system. In fact, the old system adhered closely to the set-oriented semantics of
relational algebra and relational databases; therefore, it computed and accumulated all
partial results before returning the whole set to the user. However, our experience
in developing applications indicated that a more interactive and incremental
computation model was preferable: i.e., one where users see the results incrementally
as they are produced. This allows developers to better monitor the computation as
it progresses, helping them debug their programs and, e.g., allowing them to
promptly stop executions that have fallen into infinite loops.

Therefore, LDL++ uses a pipelined execution model, whereby tuples are generated
one at a time as they are needed (i.e., lazily as the consumer requests them,
rather than eagerly). This approach also realizes objective (iii) by providing better
support for new constructs, such as choice and on-line aggregation, and for
intelligent backtracking optimization (discussed in the next section).

The LDL++ system also adopted a shallow-compilation approach that achieves
faster turnaround during program development and enhances the overall usability;
this approach also made it easier to support on-line debugging and meta-level
extensions. The previous LDL system was instead optimized for performance; thus,
it used a deep-compilation approach where the original program was translated
into a (large) C program, whose compilation and linking slowed the development
turnaround time. The architecture of the system is summarized in the next section;
additional information, a web demo, and instructions on downloading for
noncommercial use can be found in (Zaniolo et al., 1998).



3.1 Architecture 

The overall architecture of the LDL++ system and its main components are shown
in Figure 1. The major components of the system are:



The Compiler The compiler reads in LDL++ programs and constructs the Global
Predicate Connection Graph (PCG). For each query form, the compiler partially
evaluates the PCG, transforming it into a network of objects that are executed by
the interpreter. The compiler is basically similar to that of the old system (Chimenti
et al., 1990), and is responsible for checking the safety of queries and rewriting the
recursive rules using techniques such as the Magic Sets method (Bancilhon et al.,
1986) and the more specialized methods for left-linear and right-linear rules
(Ullman, 1989). These rewriting techniques result in an efficient execution plan for
queries.



The Database Managers The LDL experience confirmed the desirability of supporting
access to (i) an internal (fast-path) database and (ii) multiple external DBMSs
in a transparent fashion. This led to the design of a new system where the two
types of database managers are fully integrated.

The internal database is shown in Figure 1 as Fact Base Manager. This module
supports the management and retrieval of LDL++ complex objects, including sets
and lists, and of temporary relations obtained during the computation. In addition
to supporting users' data defined by the schema as internal relations, the
interpreter relies on the local database to store and manage temporary data sets. The
internal database is designed as a virtual-memory record manager: thus its internal
organization and indexing schemes are optimized for the situation where the pages
containing frequently used data can reside in main memory. Data is written back
onto disk at the commit point of each update transaction; when the transaction
aborts, the old data is instead restored from disk.

The system also supports an external database manager, which is designed to
optimize access to external SQL databases; this is described in Section 3.3.

Fig. 1. LDL++ Open Architecture



Interpreter The interpreter receives as input a graph of executable objects
corresponding to an LDL++ query form generated by the compiler, and executes it
by issuing get-next, and other calls, to the local database. Similar calls are also
issued by the External Database Manager and the External Predicate Manager to,
respectively, external databases, and external functions or software packages that
follow the C/C++ calling conventions. Details on the interpreter are presented in
the next section.



User Interface All applications written in C/C++ can call the LDL++ system via
a standard API; thus applications written in LDL++ can be embedded in other
procedural systems.

One such application is a line-oriented command interpreter supporting a set of
predefined user commands, command completion and on-line help. The command
interpreter is supplied with the system, although it is not part of the core system.
Basically, the interface is an application built in C++ that can be replaced with
other front-ends, including graphical ones based on a GUI, without requiring any
changes to the internals of the system. In particular, a Java-based interface for
remote demoing was added recently (Zaniolo et al., 1998).

3.2 Execution Model and Interpreter

The abstract machine for the LDL++ interpreter is based upon the architecture
described in (Chimenti et al., 1989). An LDL++ program is transformed into a
network of active objects and the graph-based interpreter then processes these
objects.



Code generation and execution Given a query form, an LDL++ program is
transformed into a Predicate Connection Graph (PCG), which can be viewed as an
AND/OR graph with annotations. An OR node represents a predicate occurrence
and each AND node represents the head of a rule. The PCG is subsequently
compiled into an evaluable data structure called a LAM (for LDL++ Abstract
Machine), whose nodes are implemented as instances of C++ classes. Arguments are
passed from one node to the other by means of variables. Unification is done at
compile time and the sharing of variables avoids useless assignments.

Each node of the generated LAM structure has a virtual² "GetTuple" interface,
which evaluates the corresponding predicate in the program. Each node also stores
a state variable that determines whether this node is being "entered" or is being
"backtracked" into. The implementation of this "GetTuple" interface depends on
the type of node. The most basic C++ classes are OR nodes and AND nodes;
then there are several more specialized subclasses of these two basic types. Such
subclasses include the special OR node that serves as the distinguished root node
for the query form, internal relation AND nodes, external relation AND nodes,
etc.



AND/OR Graph For a generic OR node corresponding to a derived relation, the
"GetTuple" interface merely issues "GetTuple" calls to its children (AND nodes).
Each successful invocation automatically instantiates the variables of both the child
(AND node) and the parent (OR node). Upon backtracking, the last AND node
which was successfully executed is executed again. The "GetTuple" on an OR node
fails when its last AND node child fails.

The dataflow points represent different entries into the AND/OR nodes, each
entry corresponding to a different state of the computation. The dataflow points
associated with each node are shown in the following table (observe their similarity
to ports in Byrd's Prolog execution model (Byrd, 1980)):



DATAFLOW POINT   DESTINATION   STATE OF COMPUTATION

ENTRY            e_dest        getting first tuple of node
BACKTRACK        b_dest        getting next tuple of node
SUCCESS          s_dest        a tuple has been generated
FAIL             f_dest        no more tuples can be generated

2 Similar to a C++ virtual function.



A dataflow point of a node can be directed to a dataflow point of a different
node by a dataflow destination. The entry destination (e_dest) of a given node is
the dataflow point to which its entry point is directed. Similarly, backtrack (b_dest),
success (s_dest), and fail destinations (f_dest) can be defined. The dataflow
destinations represent logical operations between the nodes involved; for example, a join
or union of the two nodes. The dataflow points and destinations of a node describe
how the tuples of that node are combined with tuples from other nodes (but not
how those tuples are generated).

To obtain the first tuple of an OR node we get the first tuple of its first child
AND node. To obtain the next tuple from an OR node we request it from the AND
node that generated the previous tuple. Observe that the currently "active" AND
node must be determined at run-time. When no more tuples can be generated for
a given AND node, then we go to the next AND node, till the last child AND node
is reached (at this point no more tuples can be generated for the OR node). Thus,
we have:

OR nodes:  e_dest: the e_dest of the first child AND node
           b_dest: the b_dest of the "active" child AND node
           f_dest: if node is first OR node in rule
                       then the f_dest of parent AND node
                       else the b_dest of previous OR node
           s_dest: if node is last OR node in rule
                       then the s_dest of parent AND node
                       else the e_dest of next OR node.

The execution of an AND node is conceptually less complicated. Intuitively, the 
execution corresponds to a nested loop, where, for each tuple of the first OR node, 
we generate all matching tuples from the next OR node. This continues until we 
reach the last OR node. Thus, when generating the next tuple of an AND node, 
we generate the next matching tuple from the last OR node. If there are no more 
matching tuples, we generate the next tuple from the previous OR node. When 
there are no more tuples to be generated by the first OR node, we can generate no 
more tuples for the AND node. Thus we have: 

AND nodes: e_dest: the e_dest of first OR child
           b_dest: the b_dest of last OR child
           f_dest: if node is last AND child
                       then the f_dest of parent OR node
                       else the e_dest of next AND node
           s_dest: the s_dest of parent OR node.

Given a query, the LDL++ system first finds the appropriate LAM graph for
the matching query form, then stores any constant being passed to the query form
by initializing the variables attached to the root node of the LAM graph. Finally,
the system begins the execution by repeatedly calling the "GetTuple" method on
the root of this graph. When the call fails the execution is complete.
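The tuple-at-a-time behaviour of the AND/OR graph can be imitated with Python generators, where requesting the next value of a generator plays the role of backtracking into a node. The sketch below is only an analogy: the names or_node, and_node and unify, the encoding of variables as capitalized strings, and the run-time matching are assumptions (the real system performs unification at compile time), and the sketch does not handle recursive rules.

```python
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def unify(args, tup, env):
    """Match goal arguments against a tuple; return extended bindings or None."""
    env = dict(env)
    for arg, value in zip(args, tup):
        if is_var(arg):
            if env.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return env

def or_node(pred, args, env, facts, rules):
    """OR node: stored tuples first, then each child AND node (rule) in turn."""
    for tup in facts.get(pred, []):
        bound = unify(args, tup, env)
        if bound is not None:
            yield bound
    for head_args, body in rules.get(pred, []):
        for sub in and_node(body, {}, facts, rules):
            head = tuple(sub[a] if is_var(a) else a for a in head_args)
            bound = unify(args, head, env)
            if bound is not None:
                yield bound

def and_node(body, env, facts, rules):
    """AND node: nested loop over the goals (OR children) of one rule body."""
    if not body:
        yield env
        return
    (pred, args), rest = body[0], body[1:]
    for bound in or_node(pred, args, env, facts, rules):
        yield from and_node(rest, bound, facts, rules)
```

Calling next on the generator returned by or_node for the root goal corresponds to one "GetTuple" call on the root; exhaustion of the generator corresponds to failure.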



Lazy Evaluation of Fixpoints LDL++ adopts a lazy evaluation approach (pipelining)
as its primary execution model, which is naturally supported by the AND/OR
graph described above. This model is also supported through the lazy evaluation of
fixpoints. The traditional implementation of fixpoints (Ullman, 1989; Zaniolo et al.,
1997) assumes an eager computation where new tuples are generated till the
fixpoint is reached. LDL++ instead supports lazy computation where the recursive
rules produce new tuples only in response to the goal that, as a consumer, calls
the recursive predicate. Multiple consumers can be served by one producer, since
each consumer j uses a separate cursor Cj to access the relation R written by the
producer. Whenever j needs a new tuple, it proceeds as shown in Figure 2.
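The consumer protocol of Figure 2 can be approximated by the Python sketch below. The class LazyProducer, its cursor method, and the helper reachable_from are hypothetical names chosen for illustration, and the recursive rule is modelled as a generator that reads the relation R while it grows.

```python
class LazyProducer:
    def __init__(self, derive):
        self.relation = []                      # R, in insertion order
        self.seen = set()
        self.fixpoint = False                   # the fixpoint flag F
        self._source = derive(self.relation)    # the rule, run on demand

    def _produce(self):
        """Run the rule until it adds one new tuple to R, or reaches the fixpoint."""
        for tup in self._source:
            if tup not in self.seen:
                self.seen.add(tup)
                self.relation.append(tup)
                return True
        self.fixpoint = True
        return False

    def cursor(self):
        """One consumer: yields tuples of R, asking for more only when needed."""
        pos = 0
        while True:
            if pos < len(self.relation):        # a tuple is already available
                yield self.relation[pos]
                pos += 1
            elif self.fixpoint or not self._produce():
                return                          # fixpoint reached: failure

def reachable_from(start, edges):
    """A sample recursive rule: nodes reachable from start (assumed example)."""
    def derive(relation):
        yield start
        i = 0
        while i < len(relation):
            for a, b in edges:
                if a == relation[i]:
                    yield b
            i += 1
    return derive
```

Two cursors created on the same producer share the relation: a tuple generated for one consumer is simply read by the other, and no tuple is derived before some consumer asks for it.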

A limitation of pipelining is that the internal state of each node must be kept
for computation to resume where the last call left off. This creates a problem when
several goals call the same predicate (i.e., the same subtree in the PCG is shared).
Multiple invocations of a shared node can interfere with each other (non-reentrant
code). Solutions to this problem include (i) using a stack as in Prolog, and (ii)
duplicating the source code as in the LDL system, thus ensuring that the PCG
is a tree, rather than a DAG (Chimenti et al., 1990). In the LDL++ system, we
instead use the lazy producer approach described above for situations where the
calling goals have no bound argument. If there are bound arguments in consuming
predicates, we duplicate the node. However, since each node is implemented as a
C++ class, we simply generate multiple instances of this class; i.e., we duplicate
the data but still share the code.



Fig. 2. Lazy Fixpoint Producer

Step 1. Move the cursor Cj to the next tuple of R, and consume the tuple.
Step 2. If Step 1 fails (thus, Cj is at the last tuple of R), check the fixpoint flag F.
Step 3. If the fixpoint is reached, return failure.
Step 4. If the fixpoint is not reached, call the current rule to generate a new tuple.
Step 5. If a new tuple is generated, add it to the relation R, advance Cj and return
the tuple.
Step 6. Otherwise, repeat Step 2.

Intelligent Backtracking Pipelining makes it easy to implement optimizations such
as existential optimization and intelligent backtracking (Chimenti et al., 1990). Take
for instance the following example:

Example 3.1

Intelligent Backtracking.

query3(A, B) <- b1(A), p(A, B), b2(A).

Take the situation where the first A-value generated by b1 is passed to p(A, B),
which succeeds and passes the value of A to b2. If the first call to this third goal fails,
there is no point in going back to p, since this can only return a new value for B.
Instead, we have to jump back to b1 for a new value of A. In an eager approach, all
the B-values corresponding to each A are computed, even when they cannot satisfy
b2.
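The effect of this optimization on query3 can be illustrated with a short Python sketch. The function names and the optional examined list, which records the p tuples that were actually enumerated, are assumptions introduced only to make the saving observable.

```python
_MISSING = object()

def p_matches(p, a, examined):
    """Lazily enumerate the B values with p(a, B)."""
    for a2, b in p:
        if a2 == a:
            if examined is not None:
                examined.append((a2, b))
            yield b

def query3(b1, p, b2, examined=None):
    """Lazily yield (A, B) answers; b1 is a list, p a list of pairs, b2 a set."""
    for a in b1:
        matches = p_matches(p, a, examined)
        first = next(matches, _MISSING)
        if first is _MISSING:
            continue          # p(A, B) failed: ordinary backtracking into b1
        if a not in b2:
            continue          # b2(A) failed: skip p's other B values, jump to b1
        yield (a, first)
        for b in matches:     # b2(A) holds: the remaining B values are answers
            yield (a, b)
```

With b1 = [1, 2], p = [(1, x), (1, y), (2, z)] and b2 = {2}, the tuple (1, y) is never enumerated, since the failure of b2(1) sends the computation straight back to b1.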

Similar optimizations were also supported in LDL (Chimenti et al., 1990), but
with various limitations: e.g., existential optimization was not applied to recursive
predicates, since these were not pipelined. In LDL++, the techniques are applied
uniformly, since pipelining is now used in the computation of all predicates,
including recursive ones.

3.3 External Databases 

A most useful feature of the LDL++ system is that it supports convenient and
efficient access to external databases. As shown in Figure 1, the External Database
Interface (EDI) provides the capability to interact with external databases. The
system is equipped with a generic SQL interface as well as an object-oriented design
that allows easy access to external database systems from different vendors. To link
the system with a specific external database, it is only necessary to write a small
amount of code to implement vendor-specific drivers to handle data conversion and
local SQL dialects. The current LDL++ system can link directly with Sybase,
Oracle, DB2, and indirectly with other databases via JDBC³.

The rules in a program make no distinction between internal and external
relations. Relations from external SQL databases are declared in the LDL++ schema
just like internal relations, with the additional specification of the type and the
name of the SQL server holding the data. As a result, these external resources are
transparent to the inference engine, and applications can access different databases
without changes. The EDI can also access data stored in files.

The following example shows the LDL++ schema declarations used to access
an external relation employee in the database payroll running on the server
sybase_tarski.

Example 3.2 

Schema Declaration to external Sybase server.

database({
    Sybase::employee(NAME:char(30), SALARY:int, MANAGER:char(30))
        from sybase_tarski
            use payroll
            user_name 'john'
            application_name 'downsizing'
            interface_filename '/tmp/ldl++/demo/interfaces'
            password nhoj
}).

3 Sybase is a trademark of Sybase Inc., Oracle is a trademark of Oracle Inc., DB2 is a trademark
of IBM Inc.

The LDL++ system generates SQL queries that off-load to the external database
server the computation of (i) the join, select, project queries corresponding to
positive rule goals, (ii) the set differences corresponding to the negated goals, and (iii)
the aggregate operations specified in the heads of the rules.

In the following example the rule defines expensive employees as those who make 
over 75,000 and more than their managers: 

Example 3.3 
SQL Generation 

expensive_employee(Name) <-
    employee(Name, Salary1, Manager),
    Salary1 > 75000,
    employee(Manager, Salary2, _),
    Salary1 > Salary2.

The LDL++ compiler collapses all the goals of this rule and transforms it into the
following SQL node:

expensive_employee(Name) <- sql_node(Name).

where sql_node denotes the following SQL query sent to the external database server:

SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 75000 AND
      employee_1.NAME = employee_0.MANAGER AND
      employee_0.SALARY > employee_1.SALARY

Consequently, access to the external database via LDL++ is as efficient as for
queries written directly in SQL. Rules with negated goals are also supported and
implemented via the NOT EXISTS construct of SQL. The LDL++ SQL interface
also supports updates to external databases, including set-oriented updates with
qualification conditions. Updates to external relations follow the same syntax and
semantics as the updates to local relations. The execution of each query form is
viewed as a new transaction: either it reaches its commit point or the transaction
is aborted.
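
As a quick check of what the generated query computes, the sketch below runs the SQL of Example 3.3 against a small in-memory table. It uses Python's sqlite3 module purely as a stand-in for the external server; the sample rows, the non_manager rule in the comment, and its NOT EXISTS translation are assumptions made for illustration and are not taken from the LDL++ system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (NAME TEXT, SALARY INTEGER, MANAGER TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [   # hypothetical sample data
        ("ann", 90000, "bob"),
        ("bob", 80000, "cat"),
        ("cat", 120000, None),
        ("dan", 60000, "bob"),
    ],
)

# The query of Example 3.3, as it would be sent to the database server.
EXPENSIVE_SQL = """
SELECT employee_0.NAME
FROM employee employee_0, employee employee_1
WHERE employee_0.SALARY > 75000 AND
      employee_1.NAME = employee_0.MANAGER AND
      employee_0.SALARY > employee_1.SALARY
"""
expensive = [row[0] for row in conn.execute(EXPENSIVE_SQL)]

# Hypothetical rule with a negated goal (written informally):
#   non_manager(Name) <- employee(Name, _, _), not employee(_, _, Name).
# One plausible translation of the negated goal uses NOT EXISTS.
NON_MANAGER_SQL = """
SELECT employee_0.NAME
FROM employee employee_0
WHERE NOT EXISTS (SELECT * FROM employee employee_1
                  WHERE employee_1.MANAGER = employee_0.NAME)
"""
non_managers = sorted(row[0] for row in conn.execute(NON_MANAGER_SQL))
```

With this sample data only ann earns more than 75,000 and more than her manager, and ann and dan are the employees who manage nobody.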

To better support middleware applications, the coupling of LDL++ with external
databases was further enhanced as follows:

• Literal Collapsing: The goals in the body of a rule are reordered so that
several goals using database relations can be collapsed into a single
SQL subquery that is offloaded to the DBMS.

• Rule Compression: To offload more complex and powerful queries to the remote
database, literals from multiple levels of rules are combined and the rules are
compressed vertically.

• Aggregates: Rules that contain standard SQL aggregates in their heads can
also be offloaded to the remote SQL system.
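
Literal collapsing can be pictured with a minimal sketch that turns the body of a rule into one SELECT statement. Everything in it is an assumption made for illustration: the tuple encoding of goals, the SCHEMA dictionary, and the collapse function are hypothetical and do not describe the actual LDL++ compiler. The sketch handles only conjunctions of database goals and comparisons, and it does no quoting beyond Python's repr.

```python
# Hypothetical catalog of external relations and their column names.
SCHEMA = {"employee": ["NAME", "SALARY", "MANAGER"]}

def is_var(term):
    """Rule variables are strings that start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def collapse(head_vars, body):
    """Reorder the goals (database goals first) and emit a single SQL query."""
    relational = [g for g in body if g[0] in SCHEMA]
    comparisons = [g for g in body if g[0] not in SCHEMA]
    tables, conds, binding = [], [], {}

    for i, (rel, *args) in enumerate(relational):
        alias = f"{rel}_{i}"
        tables.append(f"{rel} {alias}")
        for col, arg in zip(SCHEMA[rel], args):
            ref = f"{alias}.{col}"
            if arg == "_":                      # anonymous variable: no condition
                continue
            if not is_var(arg):                 # constant: selection condition
                conds.append(f"{ref} = {arg!r}")
            elif arg in binding:                # repeated variable: join condition
                conds.append(f"{ref} = {binding[arg]}")
            else:                               # first occurrence: remember column
                binding[arg] = ref

    def side(term):
        return binding[term] if is_var(term) else repr(term)

    for op, left, right in comparisons:
        conds.append(f"{side(left)} {op} {side(right)}")

    select_list = ", ".join(binding[v] for v in head_vars)
    sql = "SELECT " + select_list + " FROM " + ", ".join(tables)
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql

# Body of the expensive_employee rule from Example 3.3.
body = [
    ("employee", "Name", "Salary1", "Manager"),
    (">", "Salary1", 75000),
    ("employee", "Manager", "Salary2", "_"),
    (">", "Salary1", "Salary2"),
]
sql = collapse(["Name"], body)
```

The resulting string has the same FROM clause and the same three conditions as the query of Example 3.3; only the order of the conditions differs, because this sketch emits join conditions before comparisons.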



3.4 Procedural Language Interface

As shown in Figure [|, the LDL++ system is designed to achieve an open
architecture where links with procedural languages, such as C/C++, can be established in
two ways:

• Via the Application Programming Interface (API) which allows applications 
to drive the system, and 

• Via the External Predicate Manager which allows C/C++ functions to be 
imported into the inference engine as external predicates. 

Via the API, any C/C++ routine can call the LDL++ inference engine. The
API provides a set of functions that enable applications to instruct the LDL++
engine to load a schema, load rules, compile query forms, send queries, and retrieve
results.

Via the External Predicate Manager, functions defined in C/C++ can be imported
into LDL++ and treated as logical predicates callable as rule goals. A library of
C/C++ functions is also provided to facilitate the manipulation of internal LDL++
objects, and the return of multiple answers by the external functions. Therefore,
external functions can have the same behavior as internal predicates in all aspects,
including flow of control and backtracking. Details on these interfaces can be found
in (Zaniolo et al., 1998).
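
The idea that an imported function can return multiple answers and take part in backtracking can be sketched with generators. The Python below is a conceptual illustration only: solve, number and divisor are hypothetical names, and nothing here corresponds to the actual LDL++ API or to its C/C++ library, which are documented in the reference above.

```python
def solve(goals, env):
    """Depth-first evaluation of a conjunction of goals.

    Each goal is a callable that takes the current bindings and yields one
    extended binding per answer; exhausting it corresponds to backtracking.
    """
    if not goals:
        yield env
        return
    first, rest = goals[0], goals[1:]
    for extended in first(env):
        yield from solve(rest, extended)

def number(env):
    # Stand-in for an internal predicate number(N) with three facts.
    for n in (4, 5, 6):
        yield {**env, "N": n}

def divisor(env):
    # Stand-in for an external predicate divisor(N, D): an ordinary function
    # that returns several answers, one for each D with 1 < D < N dividing N.
    n = env["N"]
    for d in range(2, n):
        if n % d == 0:
            yield {**env, "D": d}

# Query: number(N), divisor(N, D).
answers = [(e["N"], e["D"]) for e in solve([number, divisor], {})]
```

From the point of view of solve, the external goal is indistinguishable from the internal one: it fails for N = 5, succeeds once for N = 4 and twice for N = 6.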



4 Applications 

The deployment of the LDL and LDL++ prototypes in various real-life applications
has contributed much to understanding the advantages and limitations of
deductive databases in key application domains (Tsur, 1990a; Tsur, 1990b).
Moreover, this experience with application problems has greatly influenced the design
of the LDL++ system and its successive improvements.

Recursive Queries. Our first focus was to compute transitive closures and to solve
various graph problems requiring recursive queries, such as Bill-of-Materials
(Zaniolo et al., 1997). Unfortunately, many of these applications also require that
set-aggregates, such as counts and minima, be computed during the recursive traversal
of the graph. Therefore, these applications could not be expressed in LDL, which
only supported stratified semantics, and thus disallowed the use of negation and
aggregation within recursive cliques. Going beyond stratification thus became a
major design objective for LDL++.

Rapid Prototyping of Information Systems. Rapid prototyping from E-R specifications
has frequently been suggested as the solution for the productivity bottleneck
in information system design. Deductive databases provide a rule-based language
for encoding executable specifications that is preferable to Prolog and 4GL
systems used in the past, because their completely declarative semantics provides a
better basis for specifications and formal methods. Indeed, LDL proved to be the
tool of choice in the rapid prototyping of information systems in conjunction with
a structured-design methodology called POS (Process, Object and State) (Ackley
et al., 1990; Tryon, 1991). Our proof-of-concept experiment confirmed the great
potential of deductive databases for the rapid prototyping of information systems; but
this also showed the need for a richer environment that also supports prototyping
of graphical interfaces, and the use of E-R based CASE tools. A large investment in
producing such tools is probably needed before this application area can produce a
commercial success for deductive databases.



Middleware. At MCC, LDL++ was used in the CARNOT/INFOSLEUTH project
to support semantic agents that carry out distributed, coordinated queries over a
network of databases (Ong et al., 1995). In particular, LDL++ was used to
implement the ontology-driven mapping between different schemas; the main functions
performed by LDL++ include (i) transforming conceptual requests by users into a
collection of cooperating queries, (ii) performing the needed data conversion, and
(iii) offloading to SQL statements executable on local schemas (for both relational
and O-O databases).



Scientific Databases. The LDL++ system provided a sound environment on which
to experiment with next-generation database applications, e.g., to support domain
science research, where complex data objects and novel query and inferencing
capabilities are required.

A first area of interest was molecular biology, where several pilot applications
relating to the Human Genome initiative (Erickson, 1992) were developed (Overbeek
et al., 1990; Tsur et al., 1990). LDL++ rules were also used to model and support
taxonomies and concepts from the biological domain, and to bridge the gap
between high-level scientific models and low-level experimental data when searching
and retrieving domain information (Tsur, 1990b).

A second research area involves geophysical databases for atmospheric and
climatic studies (Muntz et al., 1995). For instance, there is a need for detecting and
tracking over time and space the evolution of synoptic weather patterns, such as
cyclones. The use of LDL++ afforded the rapid development of queries requiring
sophisticated spatio-temporal reasoning on the geographical database. This first
prototype was then modified to cope with the large volume of data required, by
off-loading much of the search work to the underlying database. Special constructs
and operators were also added to express cyclone queries (Muntz et al., 1995).



Knowledge Discovery and Decision Support Applications. The potential of the LDL++
technology in this important application area was clear from the start (Naqvi &
Tsur, 1989), when our efforts concentrated on providing the analyst with powerful
tools for the verification and refinement of scientific hypotheses (Tsur, 1990a). In
our early experiments, the expert would write complex verification rules that were
then applied to the data. LDL++ proved well-suited for the rapid prototyping of
these rules, yielding what became known as the 'data dredging' paradigm (Tsur,
1990a).



A more flexible methodology was later developed combining the deductive rules
with inductive tools, such as classifiers or Bayesian estimation techniques. A
prototype of a system combining both the deductive and inductive methods is the
"Knowledge Miner" (Shen et al., 1994), which was used in the discovery of rules
from a database of chemical process data; LDL++ meta-predicates proved very
useful in this experiment (Shen et al., 1996).



Other experiments demonstrated the effectiveness of the system in performing
important auxiliary tasks, such as data cleaning (Tsou et al., 1992; Sheth et al.,
1995). In these applications, the declarative power of LDL++ is used to specify
the rules that define correct data. These allow record-by-record verification of data
for correctness but also the identification of sets of records whose combination
violates the integrity of the data. Finally, the rules are used to clean (i.e., correct)
inconsistent data. This capability can either be used prior to the loading of data
into the database, or during the updating of the data after loading. These early
investigations paved the way for a major research project, discussed next, focusing
on using LDL++ in datamining applications.
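
The two kinds of checks mentioned above, record-by-record verification and detection of sets of records that are jointly inconsistent, can be sketched as follows. The record layout, the age range, and the rule that two records with the same id must agree on age are all hypothetical, chosen only to make the distinction visible. The sketch merely sets aside offending records rather than correcting them.

```python
# Hypothetical input data; field names and values are invented for illustration.
records = [
    {"id": 1, "name": "ann", "age": 34},
    {"id": 2, "name": "bob", "age": -5},   # fails the record-level rule
    {"id": 3, "name": "cat", "age": 41},
    {"id": 3, "name": "cat", "age": 14},   # conflicts with the previous record
]

def valid(record):
    """Record-level rule (assumed): an age must lie between 0 and 130."""
    return 0 <= record["age"] <= 130

def conflicting_ids(recs):
    """Set-level rule (assumed): records sharing an id must agree on age."""
    ages_by_id = {}
    for r in recs:
        ages_by_id.setdefault(r["id"], set()).add(r["age"])
    return sorted(k for k, ages in ages_by_id.items() if len(ages) > 1)

rejected = [r for r in records if not valid(r)]
conflicts = conflicting_ids(records)
cleaned = [r for r in records if valid(r) and r["id"] not in conflicts]
```

The same two checks could be applied before loading, as a filter on the input, or after loading, as queries over the stored relation.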

Developing Data Mining Applications. The results of extensive experience with an
LDL++-based environment for knowledge discovery were reported in (Giannotti
et al., 1999; Bonchi et al., 1999). The first study (Giannotti et al., 1999) describes
the experience with a fraud detection application, while the second one reports
on a marketing application using market basket analysis techniques (Bonchi et al.,
1999). In both studies, LDL++ proved effective at supporting the many diverse
steps involved in the KDD process. In (Bonchi et al., 1999), the authors explain the
rationale for their approach and the reasons for their success, by observing that the
process of making decisions requires the integration of two kinds of activities: (i)
knowledge acquisition from data (inductive reasoning), and (ii) deductive reasoning
about the knowledge thus induced, using expert rules that characterize the specific
business domain. Activity (i) relies mostly on datamining functions and algorithms
that extract implicit knowledge from raw data by performing aggregation and
statistical analysis on the database. A database-oriented rule-based system, such as
LDL++, is effective at driving and integrating the different tasks involved in (i)
and very effective in activity (ii), where the results of activity (i) are refined,
interpreted and integrated with domain knowledge and business rules characterizing the
specific application.

For instance, association rules derived from market basket analysis are often too
low-level to be directly used for marketing decisions. Indeed, market analysts seek
answers to higher-level questions, such as "Is the supermarket assortment adequate
for the company's target customer class?" or "Is a promotional campaign effective in
establishing a desired purchasing habit in the target class of customers?". LDL++
deductive rules were used in (Bonchi et al., 1999) to drive and control the overall
discovery process and to refine the raw association rules produced by datamining
algorithms into knowledge of interest to the business. For instance, LDL++ would
be used to express queries such as "Which rules survive/decay as one moves up
or down the product hierarchy?" or "What rules have been affected by the recent
promotions?" (Bonchi et al., 1999).

The most useful properties of LDL++ mentioned in these studies (Giannotti
et al., 1999; Bonchi et al., 1999; Giannotti et al., 2001a) were flexibility, capability to
adapt to the analyst's needs, and modularity, i.e., the ability to clearly separate the
different functional components, and provide simple interfaces for their integration.
In particular, the user-defined aggregates described in Section 2.2 played a pivotal
role in these datamining applications, since datamining functions (performing the
inductive tasks) were modelled as user-defined aggregates which could then be
conveniently invoked by the LDL++ rules performing the deductive tasks (Giannotti
et al., 2001a). The performance and scalability challenge was then addressed by
encoding these user-defined aggregates by means of LDL++ procedural extensions,
and, for database-resident data, offloading critical tasks to the database system
containing the data (Giannotti et al., 2001a).



Lessons Learned. The original motivation for the development of the original LDL
system was the desire to extend relational query languages to support the
development of complete applications, thus eliminating the impedance mismatch from
which applications using embedded SQL are now suffering. In particular,
data-intensive expert systems were the intended 'killer' applications for LDL. It was believed
that such applications call for combining databases and logic programming into
a rule-based language capable of expressing reasoning, knowledge representation,
and database queries. While the original application area failed to generate much
commercial demand, other very promising areas have emerged since then. Indeed, the
success of LDL++ in several areas is remarkable, considering that LDL++ is
suffering from the combined drawbacks of (i) being a research prototype (rather than
a supported product), and yet (ii) being subject to severe licensing limitations.
Unless the situation changes and these two handicaps are removed, the only
opportunities for commercial deployments will come from influencing other systems;
i.e., from systems that borrow the LDL++ technology to gain an edge in advanced
application areas, such as datamining and decision support systems.



5 Conclusion 



Among the many remarkable projects and prototypes (Ramakrishnan & Ullman,
1995) developed in the field of logic and databases (Minker, 1996), the LDL/LDL++
project occupies a prominent position because of the level and duration of its research
endeavor, which brought together theory, systems, and applications. By all objective
measures, the LDL++ project succeeded in its research objectives. In particular,
the nondeterministic and nonmonotonic constructs now supported in LDL++ take
declarative logic-based semantics well beyond stratification in terms of power and
expressivity (and stratified negation is already more powerful than SLD-NF). The
LDL++ system supports well the language and its applications. In particular, the
pipelined execution model dovetails with constructs such as choice and aggregates
(and incremental answer generation), while the system's open architecture supports
tight coupling with external databases, JDBC, and other procedural languages. The
merits of the LDL++ technology, and therefore of deductive databases in the large,
have been demonstrated in several pilot applications, particularly datamining
applications.

Although there is no current plan to develop LDL++ commercially, there remain
several exciting opportunities to transfer its logic-oriented technology to related
fields. For instance, the new query and data manipulation languages for web
documents, particularly XML documents, bear affinity to logic-based rule languages.
Another is the extension to SQL databases of the new constructs and non-stratified
semantics developed for LDL++: in fact, the use of monotonic aggregates in SQL
has already been explored in (Wang & Zaniolo, 2000).

Appendix I: Aggregates in Logic



The expressive power of choice can be used to provide a formal definition of
aggregates in logic. Say for instance that we want to define the aggregate avg that returns
the average of all Y-values that satisfy d(Y). By the notation used in LDL (Chimenti
et al., 1990), CORAL (Ramakrishnan et al., 1993), and LDL++, this computation
can be specified by the following rule:

p(avg<Y>) <- d(Y).

A logic-based equivalent for this rule is

p(Y) <- results(avg, Y).

where results(avg, Y) is derived from d(Y) by (i) the chain rules, (ii) the cagr rules
and (iii) the return rules.

The chain rules are those of Example 3 that place the elements of d(Y) into an
order-inducing chain.

chain(nil, nil).
chain(X, Y) <- chain(_, X), d(Y),
               choice((X), (Y)), choice((Y), (X)).

Now, we can define the cagr rules to perform the inductive computation by calling 
the single and multi rules as follows: 

cagr(AgName, Y, New) <- chain(nil, Y), Y ≠ nil, single(avg, Y, New).
cagr(AgName, Y2, New) <- chain(Y1, Y2), cagr(AgName, Y1, Old),
                         multi(AgName, Y2, Old, New).

Thus, the cagr rules are used to memorize the previous results, and to apply 
(i) single to the first element of d(Y) (i.e., for the pattern chain(nil, Y)) and (ii) 
multi to the successive elements. The return rules are as follows: 



results(AgName, Yield) <- chain(Y1, Y2), cagr(AgName, Y1, Old),
                          ereturn(AgName, Y2, Old, Yield).

results(AgName, Yield) <- chain(X, Y), ¬chain(Y, _),
                          cagr(AgName, Y, Old),
                          freturn(AgName, Y, Old, Yield).

Therefore, we first compute chain, and then cagr, which applies the single and
multi rules to every element in the chain. Concurrently, the first results rule
produces all the results that can be generated by the application of the ereturn
rules to the elements in the chain. The final returns are instead computed by the
second results rule, which calls on the freturn rules once the last element in the
chain (i.e., the element without successors) is detected. The second results rule is
the only rule using negation; in the absence of freturn this rule can be removed,
yielding a positive choice program that is monotonic by Theorem 2. Thus, every
aggregate with only early returns is monotonic with respect to set containment and
can be used freely in recursive rules.
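
The operational reading of the chain, cagr and results rules can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LDL++ implementation: the order in which the input is iterated plays the role of the chain picked by choice, the aggregate name argument is dropped, the state encoding (sum, count) for avg is chosen here for convenience, and single, multi, ereturn and freturn are passed as ordinary functions that always succeed.

```python
def aggregate(values, single, multi, ereturn=None, freturn=None):
    """Yield the results of an aggregate given by its single/multi/return rules."""
    state, last, started = None, None, False
    for y in values:                      # iteration order stands in for chain
        if not started:
            state = single(y)             # first cagr rule: pattern chain(nil, Y)
            started = True
        else:
            if ereturn is not None:
                yield ereturn(y, state)   # first results rule: Old precedes Y2
            state = multi(y, state)       # second cagr rule
        last = y
    if started and freturn is not None:
        yield freturn(last, state)        # second results rule: last element

# avg: the state is a (sum, count) pair, and only a final return is defined.
avg_result = list(aggregate(
    [3, 5, 10],
    single=lambda y: (y, 1),
    multi=lambda y, old: (old[0] + y, old[1] + 1),
    freturn=lambda y, old: old[0] / old[1],
))

# A running count with early returns only (hypothetical example aggregate).
running_counts = list(aggregate(
    [3, 5, 10],
    single=lambda y: 1,
    multi=lambda y, old: old + 1,
    ereturn=lambda y, old: old + 1,
))
```

Following the rules above literally, an early return is produced only for elements that have a predecessor in the chain, so the running count yields nothing for the first element. Because that aggregate has no final return, it never needs to know which element is last, which matches the observation that aggregates with only early returns do not require the rule using negation.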



